The circular drift-diffusion model (CDDM) is a stochastic sequential sampling model that describes the choices and response times observed in tasks with a circular decision space (Smith, 2016). Like many other accumulator models, the CDDM assumes that participants accumulate information in favor of a given response option over time until a response criterion is met. This process is characterized as a random walk that starts at the origin of a circle representing the decision space and moves towards its circumference. Once the circumference is reached, the radian at the point of intersection determines the choice, and the number of steps it took to get there determines the response time.
The CDDM has four parameters, most of which are illustrated in Fig 1. The parameter not shown is the nondecision time \(\tau\). The remaining parameters are the boundary radius \(\eta\), which serves as the response criterion, and a pair of parameters describing the overall speed and direction of the information accumulation process. These last two parameters can be expressed in Cartesian coordinates (i.e., the mean displacement on the X and Y axes per unit of time, \(\{\mu_x, \mu_y\}\)) or polar coordinates (i.e., the average speed and expected direction, \(\{\delta,\theta\}\)).
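The two parameterizations are interchangeable. As a minimal sketch, the conversion between them can be written as follows (the function names here are illustrative, not taken from this document's codebase):

```r
# Convert between the polar (delta, theta) and Cartesian (mu_x, mu_y)
# drift parameterizations of the CDDM.
polar_to_cartesian <- function(delta, theta) {
  c(mu_x = delta * cos(theta), mu_y = delta * sin(theta))
}
cartesian_to_polar <- function(mu_x, mu_y) {
  c(delta = sqrt(mu_x^2 + mu_y^2),  # average speed
    theta = atan2(mu_y, mu_x))      # expected direction
}
```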
Fig 1. Graphical illustration of the CDDM
In this document, we present a comparison between different sampling algorithms that generate data under the CDDM. We describe how each of these algorithms works and highlight how they differ in terms of the computation time they require and their accuracy. To begin, we generate n = 5000 bivariate observations using the arbitrary set of parameter values shown below:
Each pair of observations is obtained by emulating the random walk process described by the CDDM. We emulate the information accumulation process by iteratively sampling random movements in the X and Y directions using \(\mu_x\) and \(\mu_y\), until the random walk reaches the boundary circle of radius \(\eta\).
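The emulation described above can be sketched for a single trial as follows. This is a simplified illustration, not the exact emulator used in this document: the function name, the choice of Gaussian steps, and the default `dt` and `sigma` values are all assumptions.

```r
# One random-walk trial: take Gaussian steps with mean drift (mu_x, mu_y)
# per time step dt until the walk crosses the boundary circle.
rw_trial <- function(mu_x, mu_y, boundary, tzero, dt = 0.001, sigma = 1) {
  x <- y <- 0
  t <- 0
  sd_step <- sigma * sqrt(dt)
  while (sqrt(x^2 + y^2) < boundary) {
    x <- x + mu_x * dt + rnorm(1, 0, sd_step)
    y <- y + mu_y * dt + rnorm(1, 0, sd_step)
    t <- t + dt
  }
  c(choice = atan2(y, x),  # radian where the walk hit the circumference
    RT     = t + tzero)    # decision time plus nondecision time
}
```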
par(pty = "m", mar = c(3, 3, 3, 0))
hist(X.RW$bivariate.data[,2], main = "Response Times", col = col.RW(0.7))

The execution of the Random Walk emulator algorithm took approximately 6.6815 seconds.
Fig.2 Visual representation of the Rejection sampling algorithm. The blue lines represent the joint density function of response times and choices prescribed by the CDDM. The dots represent the pairs of candidate values generated from the bivariate space of possible RTs and choices, plotted at the height of the random uniform(0, maxDensity) value used as the rejection criterion. If the rejection value is lower than the density function (i.e., the dot falls below the density curve), the candidate is accepted; if it surpasses the density (i.e., the dot is drawn above), the candidate is rejected. The process of generating, testing, and keeping candidates is repeated until the desired sample size is achieved.
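The scheme described in the caption can be sketched generically as below. Here `target_density` stands in for the CDDM joint density, and the function name and arguments are illustrative rather than the actual implementation used in this document:

```r
# Rejection sampling: draw a candidate (choice, RT) uniformly from the
# bivariate space, draw u ~ Uniform(0, maxDensity), and keep the
# candidate only if u falls below the target density at that point.
reject_sample <- function(n, target_density, c_max, t_max, maxDensity) {
  out <- matrix(NA, n, 2, dimnames = list(NULL, c("choice", "RT")))
  kept <- 0
  while (kept < n) {
    cand <- c(runif(1, 0, c_max), runif(1, 0, t_max))
    if (runif(1, 0, maxDensity) < target_density(cand)) {
      kept <- kept + 1
      out[kept, ] <- cand
    }
  }
  out
}
```

Note that `maxDensity` must be at least the maximum of the target density over the sampled region; otherwise high-density candidates are under-sampled.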
The execution of the Rejection sampling algorithm took approximately 2.0254 seconds.
The following Metropolis algorithm uses the density function to generate random observations under the CDDM. Please read the comments left throughout the code to get a better idea of how this algorithm works. This algorithm only needs a list `par` that specifies the values to use for each of the four parameters of the model (in either of its parameterizations, using polar or Cartesian coordinates).
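The core idea can be sketched for a generic bivariate target as follows. This is a minimal random-walk Metropolis sketch, not the `sample.Metropolis.cddm` function itself; the function name, Gaussian proposal, and step size are assumptions:

```r
# Random-walk Metropolis: propose a move around the current state and
# accept it with probability min(1, density(proposal) / density(current)).
metropolis_sample <- function(n, target_density, start, step = 0.2) {
  out <- matrix(NA, n, 2)
  current <- start
  d_current <- target_density(current)
  for (i in 1:n) {
    proposal <- current + rnorm(2, 0, step)
    d_prop <- target_density(proposal)
    # Always accept if the current density is zero (avoids division by zero)
    if (d_current <= 0 || runif(1) < d_prop / d_current) {
      current <- proposal
      d_current <- d_prop
    }
    out[i, ] <- current  # rejected proposals repeat the current state
  }
  out
}
```

Unlike rejection sampling, successive draws are autocorrelated, which is one reason the accuracy of this sampler can lag behind despite its speed.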
Fig.3 Visual representation of the Metropolis sampling algorithm. The blue lines represent the joint density function of response times and choices prescribed by the CDDM. The dots represent candidate values proposed around the current state of the chain. Each candidate is accepted with probability equal to the ratio of its density to the density of the current state (capped at 1); accepted candidates become the new state of the chain, while rejected candidates leave the chain at its current state. The process is repeated until the desired sample size is achieved.
The execution of the Metropolis sampling algorithm took approximately 1.2312 seconds.
par(pty = "m", mar = c(3, 3, 3, 0))
hist(X.invCDF[,2], main = "Response Times", col = col.invCDF(0.7))

The execution of the inverse-CDF algorithm took approximately 10.5781 seconds.
To compare the accuracy of the four algorithms, we contrast the empirical CDF of each generated sample against the theoretical CDF of the model.
# Load file containing custom eCDF function
source("./code/general_functions/eCDF.R")
RW.eCDF <- myECDF(X.RW$bivariate.data)
Reject.eCDF <- myECDF(X.Reject)
Metro.eCDF <- myECDF(X.Metro)
invCDF.eCDF <- myECDF(X.invCDF)

RW.tCDF <- pCDDM(X.RW$bivariate.data, par$drift, par$theta, par$tzero, par$boundary)
Reject.tCDF <- pCDDM(X.Reject, par$drift, par$theta, par$tzero, par$boundary)
Metro.tCDF <- pCDDM(X.Metro, par$drift, par$theta, par$tzero, par$boundary)
invCDF.tCDF <- pCDDM(X.invCDF, par$drift, par$theta, par$tzero, par$boundary)

Obtaining the approximate CDF for the data generated with each sampling algorithm took between 0.5305 and 0.6849 seconds.
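The custom `myECDF` function is not reproduced here. Assuming it implements the standard bivariate empirical CDF evaluated at each observation, it can be sketched as:

```r
# Bivariate empirical CDF at each observed point: the proportion of
# observations with both coordinates less than or equal to that point.
bivar_ecdf <- function(X) {
  apply(X, 1, function(p) mean(X[, 1] <= p[1] & X[, 2] <= p[2]))
}
```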
# We build a simple function to compare these CDFs
getDifferences <- function(eCDFs, tCDFs){
  difference <- tCDFs - eCDFs
  difference.sum <- apply(difference, 2, sum)
  KS_statistic <- apply(abs(difference), 2, max) # Kolmogorov–Smirnov statistic
  sq.difference <- apply(difference^2, 2, sum)
  output <- cbind(difference.sum, KS_statistic, sq.difference)
  colnames(output) <- c("sumDiff", "KS-statistic", "SSDiff")
  rownames(output) <- sub("\\..*", "", colnames(eCDFs))
  return(output)
}

eCDFs <- cbind(RW.eCDF, Reject.eCDF, Metro.eCDF, invCDF.eCDF)
tCDFs <- cbind(RW.tCDF, Reject.tCDF, Metro.tCDF, invCDF.tCDF)
getDifferences(eCDFs, tCDFs)

##             sumDiff KS-statistic      SSDiff
## RW        44.826836   0.04863066   0.8206141
## Reject     7.175864   0.02242173   0.1539926
## Metro    111.767123   0.12652307  10.8697220
## invCDF  1506.522934   0.59990840 583.5107215
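As a toy check on what these three metrics capture, consider a theoretical CDF on a grid and an empirical CDF shifted a constant 0.1 below it (clipped at zero). This reproduces the arithmetic of `getDifferences` on a single column:

```r
# Toy illustration of the discrepancy metrics: a uniform shift of 0.1
# appears directly as the KS statistic, while sumDiff and SSDiff
# accumulate it across the 10 shifted grid points.
tCDF <- seq(0, 1, length.out = 11)
eCDF <- pmax(tCDF - 0.1, 0)
difference <- tCDF - eCDF
metrics <- c(sumDiff = sum(difference),
             KS      = max(abs(difference)),  # Kolmogorov–Smirnov statistic
             SSDiff  = sum(difference^2))
```

Here `sumDiff` is signed, so deviations in opposite directions can cancel (as the negative values in the second results table below show); the KS statistic and sum of squared differences do not suffer from this.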
compareTime <- function(n.Samples, n.Datasets, par){
  empty_matrix <- matrix(NA, nrow = n.Datasets, ncol = length(n.Samples))
  times.RW = times.Reject = times.Metro = times.invCDF = empty_matrix
  seed <- 1
  for(m in 1:n.Datasets){
    set.seed(seed)
    i <- 1
    for(n in n.Samples){
      # Use difftime() with explicit units: subtracting Sys.time() values
      # directly may switch to minutes for long runs
      start_time <- Sys.time()
      x <- sample.RW.cddm(n, par)
      end_time <- Sys.time()
      times.RW[m,i] <- round(as.numeric(difftime(end_time, start_time, units = "secs")), 4)
      start_time <- Sys.time()
      sample.Reject.cddm(n, par, plot = FALSE)
      end_time <- Sys.time()
      times.Reject[m,i] <- round(as.numeric(difftime(end_time, start_time, units = "secs")), 4)
      start_time <- Sys.time()
      sample.Metropolis.cddm(n, par, plot = FALSE)
      end_time <- Sys.time()
      times.Metro[m,i] <- round(as.numeric(difftime(end_time, start_time, units = "secs")), 4)
      start_time <- Sys.time()
      sample.invCDF.cddm(n, par, plot = FALSE)
      end_time <- Sys.time()
      times.invCDF[m,i] <- round(as.numeric(difftime(end_time, start_time, units = "secs")), 4)
      i <- i + 1; seed <- seed + 1
    }
  }
  return(list("times.RW" = times.RW,
              "times.Reject" = times.Reject,
              "times.Metro" = times.Metro,
              "times.invCDF" = times.invCDF))
}

Let’s briefly explore the results we get when we try a second set of parameter values with a much larger response boundary (and a considerably slower random walk process).
# Arbitrary set of parameter values
par2 <- list("drift" = 1,
             "theta" = pi,
             "tzero" = 0.1,
             "boundary" = 4)
n <- 5000

##            sumDiff KS-statistic      SSDiff
## RW        9.672372   0.01887760   0.1068935
## Reject   -1.080100   0.01831517   0.1151862
## Metro   -77.308784   0.07093669   2.8618044
## invCDF 1486.107085   0.59196258 582.4931697